Discriminative Method for Japanese Kana-Kanji Input Method
Abstract
The most popular type of input method in Japan is kana-kanji conversion, which converts a string of kana into a mixed kanji-kana string. However, no study has applied discriminative methods such as structured SVMs to kana-kanji conversion. One reason is that learning a discriminative model from a large data set is often intractable. Thanks to recent research progress, however, large-scale learning of discriminative models has become feasible. In the present paper, we investigate whether discriminative methods such as structured SVMs can improve the accuracy of kana-kanji conversion. To the best of our knowledge, this is the first study comparing a generative model and a discriminative model for kana-kanji conversion. An experiment revealed that a discriminative method can improve performance by approximately 3%.
Similar resources
Large Scale Collocation Data and Their Application to Japanese Word Processor Technology
Word processors or computers used in Japan employ Japanese input method through keyboard stroke combined with Kana (phonetic) character to Kanji (ideographic, Chinese) character conversion technology. The key factor of Kana-to-Kanji conversion technology is how to raise the accuracy of the conversion through the homophone processing, since we have so many homophonic Kanjis. In this paper, we re...
Keyboards for inputting Japanese language-arxiv
The most commonly used Japanese alphabets are Kanji, Hiragana and Katakana. The Kanji alphabet includes pictographs or ideographic characters that were adopted from the Chinese alphabet. Hiragana and Katakana are phonetic alphabets that do not include any characters common to each other or to Kanji. Hiragana is used to spell words of Japanese origin, while Katakana is used to spell words of wes...
Candidate Display Styles in Japanese Input
Typing Japanese into computers consists of typing Roman alphabet, displaying the kana character, converting kana to kanji, and selecting the intended kanji character from a list of homophonic candidates. This paper presents a study of four candidate display styles, three commonly used in commercial products (“vertical,” “horizontal,” and “compact-horizontal”) and one novel (“matrix”), together ...
Recent Topics in Speech Recognition Research at NTT Laboratories
This paper introduces three recent topics in speech recognition research at NTT (Nippon Telegraph and Telephone) Human Interface Laboratories. The first topic is a new HMM (hidden Markov model) technique that uses VQ-code bigrams to constrain the output probability distribution of the model according to the VQ-codes of previous frames. The output probability distribution changes depending on th...
Kana-Kanji Conversion System with Input Support Based on Prediction
1 Introduction. TOSHIBA developed the world's first Japanese word processor in 1978. Unlike languages based on an alphabet, Japanese uses thousands of kanji characters of varying complexity. Hence, to arrange all of kanji characters on keyboard is difficult. On the other hand, kana characters which are phonetic scripts of Japanese have 83 variations; these can be arranged o...